[ntuple] add different modes of descriptor loading to RNTupleSerializer #17541
Test Results: 18 files, 18 suites, 5d 4h 24m 45s ⏱️. For more details on the failures, see the corresponding CI check. Results for commit 07f170b.
In principle this looks good to me! Apart from some minor comments, I think we should add a unit test that constructs a descriptor with one suppressed column and one deferred column and exercises all three reconstruction modes.
The default mode, which is the behavior we had so far, is "for reading": we not only load the on-disk information into the in-memory descriptor, but also fix up the suppressed column ranges and the clusters using the information from the header extension (deferred columns). This step is required to read the RNTuple correctly.

We then add two additional modes: "for writing" and "raw". "For writing" is the same as "for reading" except that it skips the header extension fixup; the RNTupleMerger will use this mode to correctly write out an incrementally merged RNTuple. "Raw" simply deserializes the on-disk information without any fixup.
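To make the relationship between the three modes concrete, here is a minimal, self-contained C++17 sketch of how a mode flag could gate the two fixup steps. It does not use the ROOT API: `EDeserializeMode`, `ColumnRange`, `Cluster`, `Descriptor`, `FixupSuppressedRanges`, `FixupDeferredColumns`, `ApplyFixups` and `MakeExample` are all hypothetical names. The fixups are heavily simplified under two stated assumptions: a suppressed column just mirrors the element range of the active column of the same field, and a deferred column has one element per entry in the clusters written before it was added. `MakeExample` builds a toy descriptor with one suppressed and one deferred column, along the lines of the unit test suggested in the review.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical mirror of the three modes; not the actual ROOT declaration.
enum class EDeserializeMode { kForReading, kForWriting, kRaw };

// Simplified element range of one column within one cluster (hypothetical).
struct ColumnRange {
   std::uint64_t fFirstElement = 0;
   std::uint64_t fNElements = 0;
   bool fIsSuppressed = false;
   std::size_t fActiveColumn = 0; // column whose range a suppressed column mirrors
};

struct Cluster {
   std::uint64_t fNEntries = 0;
   // Indexed by column id; nullopt means no range was stored on disk.
   std::vector<std::optional<ColumnRange>> fRanges;
};

struct Descriptor {
   std::vector<bool> fIsDeferred; // true = column added via the header extension
   std::vector<Cluster> fClusters;
};

// Assumption: suppressed ranges carry no element info on disk, so copy it
// from the active column of the same field.
void FixupSuppressedRanges(Descriptor &desc) {
   for (auto &cluster : desc.fClusters) {
      for (auto &range : cluster.fRanges) {
         if (!range || !range->fIsSuppressed) continue;
         assert(range->fActiveColumn < cluster.fRanges.size());
         const auto &active = cluster.fRanges[range->fActiveColumn];
         if (active && !active->fIsSuppressed) {
            range->fFirstElement = active->fFirstElement;
            range->fNElements = active->fNElements;
         }
      }
   }
}

// Assumption: a deferred column has no range in clusters written before it
// existed; synthesize one covering the cluster's entries.
void FixupDeferredColumns(Descriptor &desc) {
   for (std::size_t col = 0; col < desc.fIsDeferred.size(); ++col) {
      if (!desc.fIsDeferred[col]) continue;
      std::uint64_t nextFirst = 0;
      for (auto &cluster : desc.fClusters) {
         if (cluster.fRanges.size() <= col) cluster.fRanges.resize(col + 1);
         auto &range = cluster.fRanges[col];
         if (!range) range = ColumnRange{nextFirst, cluster.fNEntries, false, col};
         nextFirst = range->fFirstElement + range->fNElements;
      }
   }
}

void ApplyFixups(Descriptor &desc, EDeserializeMode mode) {
   if (mode == EDeserializeMode::kRaw) return; // on-disk information only
   FixupSuppressedRanges(desc);                // "for reading" and "for writing"
   if (mode == EDeserializeMode::kForReading)
      FixupDeferredColumns(desc);              // header extension fixup: reading only
}

// Toy descriptor: column 1 is suppressed (mirrors column 0), column 2 is deferred.
Descriptor MakeExample() {
   Descriptor desc;
   desc.fIsDeferred = {false, false, true};
   Cluster c0; // written before column 2 was added
   c0.fNEntries = 10;
   c0.fRanges = {ColumnRange{0, 10, false, 0}, ColumnRange{0, 0, true, 0}};
   Cluster c1;
   c1.fNEntries = 5;
   c1.fRanges = {ColumnRange{10, 5, false, 0}, ColumnRange{0, 0, true, 0},
                 ColumnRange{10, 5, false, 2}};
   desc.fClusters = {c0, c1};
   return desc;
}
```

In this toy model, `kRaw` leaves the suppressed range empty and the first cluster without a range for the deferred column; `kForWriting` fills in the suppressed range but still leaves the deferred column absent from the first cluster; only `kForReading` also synthesizes that missing range.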
Thanks! LGTM.